Clang.jl can automatically generate Julia wrappers for a C library from a set of C header files. The following constructs are supported:
# C => Julia
function => ccall
struct => struct
enum => Enum, CEnum
union => struct
typedef => typealias of intrinsic type
# macro (limited support)
# bitfield (experimental support)
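To make the mapping above concrete, here is a hand-written sketch of what wrappers for a small C API might look like. It is not actual Clang.jl output: the names `ExPoint`, `ExColor` and `c_abs` are hypothetical, and the only real C symbol used is `abs` from the C standard library, which is assumed to be resolvable in the running Julia process.

```julia
# C: struct ExPoint { int x; int y; };   (hypothetical)
# struct => struct with C-compatible field types
struct ExPoint
    x::Cint
    y::Cint
end

# C: enum ExColor { EX_RED = 0, EX_GREEN = 1 };   (hypothetical)
# enum => Enum (CEnum.jl's @cenum would be needed if values repeated)
@enum ExColor::UInt32 begin
    EX_RED = 0
    EX_GREEN = 1
end

# C: int abs(int);   (C standard library)
# function => a thin Julia function around ccall
function c_abs(x)
    ccall(:abs, Cint, (Cint,), x)
end

println(sizeof(ExPoint))     # size of two Cint fields
println(Integer(EX_GREEN))   # underlying integer value of the enumerator
println(c_abs(-3))           # result of calling into C
```

The struct is immutable and contains only C-compatible fields, so its memory layout matches the C declaration and it can be passed to `ccall` by value.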
The following example wraps the C headers include/clang-c/*.h into LibClang.jl.
Write the configuration file generator.toml:
[general]
library_name = "libclang"
output_file_path = "./LibClang.jl"
module_name = "LibClang"
jll_pkg_name = "Clang_jll"
export_symbol_prefixes = ["CX", "clang_"]
Load the configuration file and generate the wrapper:
using Clang.Generators
using Clang.LibClang.Clang_jll
cd(@__DIR__)
# locate the C headers shipped with the JLL package
include_dir = normpath(Clang_jll.artifact_dir, "include")
clang_dir = joinpath(include_dir, "clang-c")
# generator options
options = load_options(joinpath(@__DIR__, "generator.toml"))
# compiler flags, plus the include directory
args = get_default_args()
push!(args, "-I$include_dir")
# collect the headers to wrap
headers = [joinpath(clang_dir, header) for header in readdir(clang_dir) if endswith(header, ".h")]
# headers = detect_headers(clang_dir, args)
# create the context and run the generator
ctx = create_context(headers, args, options)
build!(ctx)
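The most common use of Clang.jl is to provide a Julia interface to a C library managed by a JLL package. A JLL package ships a shared library that can be called with the ccall syntax. The general workflow for wrapping a JLL package is:
Locate the C headers
Find the compiler flags
Create a .toml file with the generator options
Build a context from these three pieces of information and run the generator
Test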
Generator = Headers + Compiler flags + Generator options
An annotated example of a generator options .toml file:
[general]
# it could also be an expression as long as `Meta.parse` can parse this string successfully.
# basically, it should be the `expression` in the following code:
# ccall((function_name, expression), returntype, (argtype1, ...), argvalue1, ...)
library_name = "libclang"
# this entry allows you to specify different library names for different headers.
# in the following example:
# library_names = {"config.h" = "libclang_config", "libclang_p.*.h" = "libclang_patch"}
# those functions in the `config.h` will be generated as:
# ccall((function_name, libclang_config), returntype, (argtype1, ...), argvalue1, ...)
library_names = {}
# output file path relative to the working directory
output_file_path = "LibClang.jl"
# if these are set, common file (types and constants) and API file (functions) will be separated
# this is for compatibility, so prologue and epilogue are not supported.
# output_api_file_path = "api.jl"
# output_common_file_path = "common.jl"
# if this entry is not empty, the generator will print the code below to the `output_file_path`.
# module module_name
#
# end # module
module_name = "LibClang"
# if this entry is not empty, the generator will print the code below to the `output_file_path`.
# using jll_pkg_name
# export jll_pkg_name
jll_pkg_name = "Clang_jll"
# for packages that have extra JLL package dependencies
jll_pkg_extra = []
# identifiers that start with one of the strings listed in this entry will be exported.
export_symbol_prefixes = ["CX", "clang_"]
# the code in the following file will be copy-pasted to `output_file_path` before the generated code.
# this is often used for applying custom patches, e.g. adding missing definitions.
prologue_file_path = "./prologue.jl"
# the code in the following file will be copy-pasted to `output_file_path` after the generated code.
# this is often used for applying custom patches.
epilogue_file_path = ""
# node with an id in the `output_ignorelist` will be ignored in the printing passes.
# this is very useful for custom editing.
output_ignorelist = [
"CINDEX_EXPORTS",
"CINDEX_VERSION",
"CINDEX_VERSION_STRING",
"CINDEX_LINKAGE",
"CINDEX_DEPRECATED",
"LLVM_CLANG_C_STRICT_PROTOTYPES_BEGIN",
"LLVM_CLANG_C_STRICT_PROTOTYPES_END",
"LLVM_CLANG_C_EXTERN_C_BEGIN",
"LLVM_CLANG_C_EXTERN_C_END"
]
# Julia's `@enum` does not allow duplicated values, so by default, C enums are translated to
# CEnum.jl's `@cenum`.
# if this entry is true, `@enum` is used and those duplicated enum constants are just commented.
use_julia_native_enum_type = false
# use `@cenum` but do not print `using CEnum`.
# this is useful in the case of using `CEnum` directly in the source tree instead of using `CEnum` as a dependency
print_using_CEnum = true
# Print enums directly as integers without @(c)enum wrapper
# Override above two options
print_enum_as_integer = false
# use deterministic symbol instead of `gensym`-generated `var"##XXX"`
use_deterministic_symbol = true
# by default, only those declarations in the local header file are processed.
# those declarations in the system headers will be treated specially and will be generated if necessary.
# if you'd like to generate all of the symbols in the system headers, please set this option to false.
is_local_header_only = true
# if this option is set to true, C code with a style of
# ```c
# typedef struct {
# int x;
# } my_struct;
# ```
# will be generated as:
# ```julia
# struct my_struct
# x::Cint
# end
# ```
# instead of
# ```julia
# struct var"##Ctag#NUM"
# x::Cint
# end
# const my_struct = var"##Ctag#NUM"
# ```
smart_de_anonymize = true
# if set to true, static functions will be ignored
skip_static_functions = false
# EXPERIMENTAL
# if this option is set to true, those structs that are not necessary to be an
# immutable struct will be generated as a mutable struct.
# this option defaults to false, do read the paragraph below before using this feature.
auto_mutability = false
# add inner constructor `Foo() = new()`
auto_mutability_with_new = true
# if you feel like certain structs should not be generated as mutable struct, please add them in the following list.
# for example, if a C function accepts a `Vector` of some type as its argument like:
# void foo(mutable_type *list, int n);
# when calling this function via `ccall`, passing a `Vector{mutable_type}(undef, n)` to the first
# argument will trigger a crash, the reason is mutable structs are not stored inline within a `Vector`,
# one should use `Ref{NTuple{n,mutable_type}}()` instead.
# this is not convenient and that's where the `auto_mutability_ignorelist` comes in.
auto_mutability_ignorelist = []
# opposite to `auto_mutability_ignorelist` and has a higher priority
auto_mutability_includelist = []
# if set to "raw", extract and dump raw c comment;
# if set to "doxygen", parse and format doxygen comment.
# note: by default, Clang only parses doxygen comment, pass `-fparse-all-comments` to Clang in order to parse non-doxygen comments.
extract_c_comment_style = "doxygen"
# Pass a function to explicitly generate documentation. It will be called like
# `callback_documentation(node::ExprNode)` if `extract_c_comment_style` is not
# set, or if it is set and no docs were found automatically.
#
# Do *not* set this in the TOML file, it should be set in the generator script
# to a function that takes in an ExprNode and returns a String[] (string
# vector).
# callback_documentation = ""
# if set to true, single line comment will be printed as """comment""" instead of """\ncomment\n"""
fold_single_line_comment = false
# if set to "outofline", documentation of struct fields will be collected at the "Fields" section of the struct
# if set to "inline", documentation of struct fields will go right above struct definition
struct_field_comment_style = "outofline"
# if set to "outofline", documentation of enumerators will be collected at the "Enumerators" section of the enum
enumerator_comment_style = "outofline"
# if set to true, C function prototype will be included in documentation
show_c_function_prototype = false
[codegen]
# map C's bool to Julia's Bool instead of `Cuchar` a.k.a `UInt8`.
use_julia_bool = true
# set this to true if the C routine always expects a NUL-terminated string.
# TODO: support filtering
always_NUL_terminated_string = true
# generate strictly typed function
is_function_strictly_typed = false
# if true, opaque pointers in function arguments will be translated to `Ptr{Cvoid}`.
opaque_func_arg_as_PtrCvoid = false
# if true, opaque types are translated to `mutable struct` instead of `Cvoid`.
opaque_as_mutable_struct = true
# if true, use Julia 1.5's new `@ccall` macro
use_ccall_macro = true
# if true, variadic functions are wrapped with `@ccall` macro. Otherwise variadic functions are ignored.
wrap_variadic_function = false
# generate getproperty/setproperty! methods for the types in the following list
field_access_method_list = []
# the generator will prefix the function argument names in the following list with a "_" to
# prevent the generated symbols from conflicting with the symbols defined and exported in Base.
function_argument_conflict_symbols = []
# emit constructors for all custom-layout structs like bitfield in the list,
# or set to `true` to do so for all such structs
add_record_constructors = []
[codegen.macro]
# it's highly recommended to set this entry to "basic".
# if you'd like to skip all of the macros, please set this entry to "disable".
# if you'd like to translate function-like macros to Julia, please set this entry to "aggressive".
macro_mode = "basic"
# function-like macros in the following list will always be translated.
functionlike_macro_includelist = [
"CINDEX_VERSION_ENCODE",
]
# if true, the generator prints the following message as comments.
# "# Skipping MacroDefinition: ..."
add_comment_for_skipped_macro = true
# if true, ignore any macro that is suffixed with "_H" or with a suffix in the `ignore_header_guards_with_suffixes` list
ignore_header_guards = true
ignore_header_guards_with_suffixes = []
# if true, ignore those pure definition macros in the C code
ignore_pure_definition = true
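The auto_mutability comments above warn that mutable structs are not stored inline in a Vector. A minimal sketch using only Base Julia (both struct names are hypothetical) shows the layout difference that makes a `Vector` of mutable structs unsuitable as a C array:

```julia
struct ImmutablePair        # laid out like `struct { int a; int b; }` in C
    a::Cint
    b::Cint
end

mutable struct MutablePair  # heap-allocated object with its own identity
    a::Cint
    b::Cint
end

# isbits types are stored inline in arrays; mutable structs are stored as references
println(isbitstype(ImmutablePair))
println(isbitstype(MutablePair))

# The inline array is one contiguous buffer of 4 elements that C can index directly
v = Vector{ImmutablePair}(undef, 4)
println(sizeof(v) == 4 * sizeof(ImmutablePair))

# The array of mutable structs holds one pointer-sized slot per element instead
w = Vector{MutablePair}(undef, 4)
println(sizeof(w) == 4 * sizeof(Ptr{Cvoid}))
```

A C function that expects `mutable_type *list` would read the pointer slots of `w` as if they were struct contents, which is why such types belong in `auto_mutability_ignorelist`.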
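Some symbols in the C headers may not be handled correctly by Clang.jl. In that case you can skip them and fill them in afterwards with a prologue file specified through prologue_file_path:
Add the symbol to output_ignorelist to skip wrapping it;
If the symbol lives in a system header and makes Clang.jl error out before printing, add @add_def symbol_name before generating to keep it from being wrapped, and post an issue on the Clang.jl GitHub repository;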
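Generation in Clang.jl actually consists of two stages, wrapping and printing, so the generated expressions can be modified before they are written to the output file. Simply run the two stages separately:
# wrap only, without printing
build!(ctx, BUILDSTAGE_NO_PRINTING)
# custom rewrite rule (fill in the body as needed)
function rewrite!(e::Expr) end
# apply the rule to every expression of every node in the DAG
function rewrite!(dag::ExprDAG)
    for node in get_nodes(dag)
        for expr in get_exprs(node)
            rewrite!(expr)
        end
    end
end
rewrite!(ctx.dag)
# print
build!(ctx, BUILDSTAGE_PRINTING_ONLY)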
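As an illustration of what a custom rewrite rule might do, the sketch below renames a wrapper function by stripping a C-style prefix from its name. It operates on a plain Julia `Expr` built by hand, so it runs without Clang.jl. The `ex_` prefix, the `libexample` name and the `prefix` keyword are hypothetical.

```julia
# Hypothetical rewrite rule: strip a prefix from the name of a wrapper function.
function rewrite!(e::Expr; prefix::String = "ex_")
    # only touch `function name(args...) ... end` definitions
    if Meta.isexpr(e, :function) && Meta.isexpr(e.args[1], :call)
        sig = e.args[1]
        fname = string(sig.args[1])
        if startswith(fname, prefix)
            sig.args[1] = Symbol(fname[length(prefix)+1:end])
        end
    end
    return e
end

# A hand-written stand-in for an expression the generator might hold
ex = :(function ex_add(a, b)
    ccall((:ex_add, libexample), Cint, (Cint, Cint), a, b)
end)

rewrite!(ex)
println(ex.args[1])   # the rewritten function signature
```

Only the Julia-side name changes. The symbol inside the `ccall` tuple is left alone, so the wrapper still calls the original C function.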
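When some data types are system-dependent, they can be skipped and then added back manually;
If the differences are too large to fix by hand, generate a separate wrapper for each platform, as is done in LibClang.jl.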
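For the manual route, a prologue file can define the skipped type per platform. The sketch below is an illustration built on assumptions: `ex_handle_t` is a hypothetical typedef imagined to be a pointer on Windows and an `int` elsewhere, and the file name `prologue.jl` simply mirrors the `prologue_file_path` entry in the options file.

```julia
# prologue.jl (hypothetical contents)
# `ex_handle_t` is assumed to be listed in `output_ignorelist`, so it is defined by hand.
@static if Sys.iswindows()
    const ex_handle_t = Ptr{Cvoid}
else
    const ex_handle_t = Cint
end

println(ex_handle_t)
```

Because the branch is resolved with `@static`, only the definition for the current platform is compiled into the module.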
| C type | ccall signature | Julia type |
| --- | --- | --- |
| Int/Float | the same | the same |
| Struct T | a Julia struct T with the same layout | T |
| Pointer (T*) | Ref{T}/Ptr{T} | Ref{T}/Ptr{T}/array |
| String (char*) | Cstring/Ptr{Cchar} | String |
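These correspondences can be tried directly against the C standard library. This sketch assumes `strlen` and `memcpy` are resolvable in the running Julia process (usually the case on typical installations) and uses no Clang.jl API:

```julia
# String (char*): annotate as Cstring and pass a Julia String
n = @ccall strlen("hello"::Cstring)::Csize_t
println(n)

# Pointer (T*): annotate as Ptr{T} and pass a Julia array
src = Cint[1, 2, 3]
dst = zeros(Cint, 3)
@ccall memcpy(dst::Ptr{Cint}, src::Ptr{Cint}, sizeof(src)::Csize_t)::Ptr{Cvoid}
println(dst)

# Pointer (T*): annotate as Ref{T} and pass a Ref holding a single value
from = Ref{Cint}(7)
to = Ref{Cint}(0)
@ccall memcpy(to::Ref{Cint}, from::Ref{Cint}, sizeof(Cint)::Csize_t)::Ptr{Cvoid}
println(to[])
```

In each call the annotation after `::` is the ccall signature type from the middle column, while the value on the left is an ordinary Julia object from the right column.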
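Ref is an abstract type in Julia and cannot be passed to C directly.
To pass a Julia string or array to C, the argument type has to be annotated as Ptr{T} (or Cstring for strings); otherwise type information is passed instead of the contents of the buffer. There are two ways to do this:
Use @ccall: @ccall printf("%s\n"::Cstring; "hello"::Cstring)::Cint
Overload to_c_type to map Julia types to the corresponding ccall signature types: once to_c_type(::Type{String}) = Cstring is added to the prologue, every String is annotated as Cstring: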
to_c_type(::Type{<:AbstractString}) = Cstring # or Ptr{Cchar}
to_c_type(t::Type{<:Union{AbstractArray, Ref}}) = Ptr{eltype(t)}
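To see what these mappings produce, the two methods can be tried in isolation. In this sketch the generic fallback `to_c_type(t::Type) = t` is an assumption added so that the snippet runs on its own; it is not claimed to match what the generator emits.

```julia
# Assumed fallback: leave every other type unchanged
to_c_type(t::Type) = t
# The two mappings discussed in the text
to_c_type(::Type{<:AbstractString}) = Cstring   # or Ptr{Cchar}
to_c_type(t::Type{<:Union{AbstractArray, Ref}}) = Ptr{eltype(t)}

println(to_c_type(String))               # strings map to Cstring
println(to_c_type(Vector{Float32}))      # arrays map to a pointer to their element type
println(to_c_type(Base.RefValue{Cint}))  # refs map to a pointer to the referenced type
println(to_c_type(Cint))                 # everything else falls through unchanged
```

Dispatch picks the most specific method, so the string and array rules override the fallback without any explicit branching.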
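Clang is an open-source compiler built on the LLVM framework, a modern compiler for C, C++ and Objective-C. Clang and LLVM are written in C++, but the Clang project maintains a C interface called libclang, which provides access to the AST and type representations.
Clang.jl wraps libclang and provides a C => Julia wrapper generator.
The following walks through an example header: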
// example.h
struct ExStruct {
    int kind;
    char* name;
    float* data;
};
void* ExFunction(int kind, char* name, float* data) {
    struct ExStruct st;
    st.kind = kind;
    st.name = name;
    st.data = data;
    return 0; // null pointer; a non-void function must return a value
}
Parsing the structure above with Clang.jl takes only a few lines of code:
using Clang
trans_unit = Clang.parse_header(Index(), "example.h")
root_cursor = Clang.getTranslationUnitCursor(trans_unit)
struct_cursor = search(root_cursor, "ExStruct") |> only
# print the kind, name and type of each field of the struct
for c in children(struct_cursor)
    println("Cursor: ", c,
            "\n Kind: ", kind(c),
            "\n Name: ", name(c),
            "\n Type: ", Clang.getCursorType(c))
end
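trans_unit holds the libclang AST interface as a TranslationUnit object. The AST is represented as a DAG of cursor nodes, each carrying three essential pieces of information:
Kind: the purpose of the cursor node
Type: the type of the object the cursor refers to
Children: the list of child nodes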
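root_cursor is the root cursor of the TranslationUnit. In Clang.jl, CLCursor is the abstract type from which all cursor types derive. Under the hood, each cursor (CXCursor) kind and each type is an enum value, and these values are used to map cursors to Julia types automatically. It is therefore possible to write multiple-dispatch methods against CLCursor or CLType variables.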
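The pattern of mapping enum values to Julia types so that methods can dispatch on them can be sketched without libclang. Everything below is hypothetical (`ToyKind`, `ToyCursor`, `wrap`, `describe`); it mimics the idea and is not Clang.jl's implementation.

```julia
# A toy "cursor kind" enum standing in for CXCursorKind
@enum ToyKind ToyStructDecl ToyFieldDecl ToyFunctionDecl

# One Julia type per kind, all derived from an abstract cursor type
abstract type ToyCursor end
struct ToyStruct   <: ToyCursor; name::String; end
struct ToyField    <: ToyCursor; name::String; end
struct ToyFunction <: ToyCursor; name::String; end

# Lookup table from enum value to Julia type
const KIND_TO_TYPE = Dict(
    ToyStructDecl   => ToyStruct,
    ToyFieldDecl    => ToyField,
    ToyFunctionDecl => ToyFunction,
)
wrap(kind::ToyKind, name::String) = KIND_TO_TYPE[kind](name)

# Multiple dispatch: specific methods per cursor type, plus a generic fallback
describe(c::ToyCursor)   = "cursor $(c.name)"
describe(c::ToyStruct)   = "struct $(c.name)"
describe(c::ToyFunction) = "function $(c.name)"

for (k, n) in [(ToyStructDecl, "ExStruct"), (ToyFieldDecl, "kind"), (ToyFunctionDecl, "ExFunction")]
    println(describe(wrap(k, n)))
end
```

`ToyField` has no method of its own, so it falls back to the method defined on the abstract type, which is the same mechanism that lets a method written for CLCursor apply to every cursor.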
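dump(root_cursor)
dump(Clang.LibClang.CXCursorKind) # cursor kinds are translated to a CEnum
# two ways to access the child nodes of a cursor:
# children(): returns an iterator over the child nodes
children(struct_cursor)
# search(): returns a list of matching child nodes; in most cases exactly one match is expected, so only() can be used as a check
search(root_cursor, "ExStruct")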
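Each CLFieldDecl cursor has an associated CLType object, which can be queried with type(); the loop above uses Clang.getCursorType() for this.
To find the cursor for the function ExFunction in example.h above, search for nodes whose cursor kind is CXCursor_FunctionDecl: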
using Clang.LibClang
fdecl = search(root_cursor, CXCursor_FunctionDecl) |> only
fdecl_children = [ c for c in children(fdecl)]